EMFET: E-mail Features Extraction Tool

Authors

  • Wadi' Hijawi
  • Hossam Faris
  • Ja'far Alqatawna
  • Ibrahim Aljarah
  • Ala' M. Al-Zoubi
  • Maria Habib
Abstract

EMFET is an open-source, flexible tool for extracting a large number of features from any email corpus whose messages are saved in EML format. The extracted features fall into three main groups: header features, payload (body) features, and attachment features. The tool is intended to help practitioners and researchers build datasets for training machine learning models for spam detection. So far, EMFET can extract 140 features. It is extensible and easy to use. The source code is publicly available on GitHub (https://github.com/WadeaHijjawi/EmailFeaturesExtraction).
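
To make the three feature groups concrete, here is a minimal sketch in Python of pulling a few header, body, and attachment features out of one EML message. It is not EMFET's code and does not reflect its implementation or its 140 features; the function name `extract_features` and the five feature names are hypothetical, chosen only for illustration. It relies solely on Python's standard `email` package.

```python
import re
from email import message_from_string, policy


def extract_features(raw_eml: str) -> dict:
    """Return a few illustrative features for one EML message.

    The feature names are hypothetical and are not taken from EMFET.
    """
    # policy.default yields EmailMessage objects with get_body()/iter_attachments()
    msg = message_from_string(raw_eml, policy=policy.default)

    # Prefer the plain-text body, fall back to HTML, else an empty string
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""

    attachments = list(msg.iter_attachments())

    return {
        # Header features
        "header_subject_length": len(msg.get("Subject", "")),
        "header_received_count": len(msg.get_all("Received", [])),
        # Payload (body) features
        "body_char_count": len(body),
        "body_url_count": len(re.findall(r"https?://\S+", body)),
        # Attachment features
        "attachment_count": len(attachments),
    }


# A tiny made-up message used only to exercise the function
SAMPLE = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Visit http://example.com now\n"
)

if __name__ == "__main__":
    print(extract_features(SAMPLE))
```

Running one such function over every `.eml` file in a corpus and collecting the resulting dictionaries as rows would give a tabular dataset of the kind the abstract describes for training spam classifiers.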

Similar resources

A Genre Analysis of Reprint Request E-mails Written by EFL and Physics Professionals

The present study aimed to analyze reprint request e-mail messages written by postgraduates (MA students) of two fields of study, namely Physics and EFL, to realize the differences and similarities between the two email types. To investigate the purpose of the study, a sample of 100 e-mail messages, 50 Physics and 50 EFL, were analyzed according to Swales’ (1990) model for reprint requests and ...

Social Network Visualization for Forensic Investigation of E-mail

E-mail features as a key technology for both the dissemination of information and for social networking. Given the volume of e-mail transmission combined with access opportunities, it is not surprising that e-mails feature heavily during a digital forensics investigation. In these investigations, forensic examiners require an understanding of the social networks to which the suspect belongs for...

On Feature Extraction for Spam E-Mail Detection

Electronic mail is an important communication method for most computer users. Spam e-mails however consume bandwidth resource, fill-up server storage and are also a waste of time to tackle. The general way to label an e-mail as spam or non-spam is to set up a finite set of discriminative features and use a classifier for the detection. In most cases, the selection of such features is empiricall...

A Summary Sentence Extraction Method for Web-based Mailing List Review Application and Its Effectiveness Study

E-mail based communication is gradually making its way into the distant collaborative learning environment. But, compared with traditional lecture cum discussion learning environment in e-mail-based collaborative discussion, it is difficult to know the latest statuses of the learners for providing immediate feedback effectively due to limited information resources. The authors propose an inform...

SummaryBIFF: An E-mail Summarizer for Mobile Phones

We have developed SummaryBIFF, a new e-mail delivery system for mobile phones that sends a summary of each newly arrived message and the URL connected to the HTML file converted from the message. The summary is generated by our sentence extraction method that takes the features of e-mail messages for business use into consideration. Our method uses the features as cue phrases and modalities tha...

Journal title:
  • CoRR

Volume: abs/1711.08521

Publication date: 2017